pr: uniformly scan for form feed and newline chars #10303

jfinkels · 2026-01-18T04:15:42Z

Fix the way form feed characters are interpreted by changing the way
lines and pages are found. Before this commit, a file comprising two
form feed characters (/f/f) would result in too few trailing newlines
at the end of the second page. After this change, each page is produced
with the correct number of lines.

This merge request changes the way files are read, replacing complex iterators
with a loop-based approach, iteratively scanning for newline or form
feed characters. The memchr library is used to efficiently scan for
these two characters. One downside of this implementation is that it
currently reads the entire input file into memory; this can be improved
in subsequent merge requests. The behavior of uutils pr is still quite different from GNU pr, so hopefully this is acceptable as a temporary debt to make many more straightforward improvements.

This merge request also removes a unit test that was enforcing incorrect behavior. These
tests were ostensibly designed to match corresponding ones in the GNU
test suite, but most of them don't match the current behavior of GNU pr, so it
does not seem valuable to keep them. As we improve uutils pr, we can add more targeted and minimal test cases to replace them.

codspeed-hq · 2026-01-18T04:22:09Z

Merging this PR will degrade performance by 3.26%

⚡ 7 improved benchmarks
❌ 1 regressed benchmark
✅ 274 untouched benchmarks
⏩ 38 skipped benchmarks¹

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
⚡	Memory	`sort_key_field[500000]`	51.8 MB	45.8 MB	+12.95%
⚡	Memory	`sort_accented_data[500000]`	28.3 MB	22.1 MB	+27.85%
⚡	Memory	`sort_long_line[160000]`	982.3 KB	724.4 KB	+35.59%
⚡	Memory	`sort_ascii_only[500000]`	28.3 MB	22.2 MB	+27.78%
⚡	Memory	`sort_case_sensitive[500000]`	17.4 MB	16.9 MB	+3.03%
⚡	Memory	`sort_mixed_data[500000]`	27.2 MB	22.5 MB	+20.7%
❌	Memory	`dd_copy_partial`	129.1 KB	133.5 KB	-3.26%
⚡	Memory	`cp_large_file[16]`	119.5 KB	112.6 KB	+6.13%

_{Comparing jfinkels:pr-simplify (aedca01) with main (0e50947)}

38 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

github-actions · 2026-01-18T04:34:06Z

GNU testsuite comparison:

Skipping an intermittent issue tests/shuf/shuf-reservoir (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/sort/sort-stale-thread-mem (passes in this run but fails in the 'main' branch)

github-actions · 2026-01-18T04:56:32Z

GNU testsuite comparison:

Skipping an intermittent issue tests/shuf/shuf-reservoir (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/sort/sort-stale-thread-mem (passes in this run but fails in the 'main' branch)

Remove unit test for `pr` that was enforcing incorrect behavior. These tests were ostensibly designed to match corresponding ones in the GNU test suite, but they don't match the current behavior of GNU `pr`, so it is not valuable to keep them.

Fix the way form feed characters are interpreted by changing the way lines and pages are found. Before this commit, a file comprising two form feed characters (`/f/f`) would result in too few trailing newlines at the end of the second page. After this change, each page is produced with the correct number of lines. This commit changes the way files are read, replacing complex iterators with a loop-based approach, iteratively scanning for newline or form feed characters. The `memchr` library is used to efficiently scan for these two characters. One downside of this implementation is that it currently reads the entire input file into memory; this can be improved in subsequent merge requests.

github-actions · 2026-01-18T15:18:03Z

GNU testsuite comparison:

Skipping an intermittent issue tests/tty/tty-eof (passes in this run but fails in the 'main' branch)

sylvestre · 2026-01-18T19:25:26Z

src/uu/pr/src/pr.rs

+    // TODO Read incrementally.
+    let buf = if path == "-" {
+        let mut f = stdin();
+        let mut buf = vec![];
+        f.read_to_end(&mut buf)?;
+        buf
+    } else {
+        std::fs::read(path)?
+    };


duplicated code with get_file_line_groups
please fix it in a different pr

sylvestre · 2026-01-18T19:25:51Z

src/uu/pr/src/pr.rs

+    ) -> Result<Self, FromUtf8Error> {
+        // TODO Don't read bytes to String just to directly write them
+        // out again anyway.
+        let line_content = String::from_utf8(buf.to_vec())?;


it doesn't work with buff directly ?

I was trying to accommodate some existing code in this file that expects a String. It should be okay to just use the bytes, I will fix that in a new merge request. Thanks

jfinkels added 2 commits January 18, 2026 09:48

pr: remove inaccurate unit test

f7f291d

Remove unit test for `pr` that was enforcing incorrect behavior. These tests were ostensibly designed to match corresponding ones in the GNU test suite, but they don't match the current behavior of GNU `pr`, so it is not valuable to keep them.

jfinkels force-pushed the pr-simplify branch from 00ce6ba to aedca01 Compare January 18, 2026 15:01

jfinkels changed the title ~~pr: simplify the implementation of reading pages~~ pr: uniformly scan for form feed and newline chars Jan 18, 2026

jfinkels marked this pull request as ready for review January 18, 2026 15:44

This was referenced Jan 18, 2026

pr: ignore empty line after form feed char #10331

Draft

pr: ignore empty line after newline char #10332

Draft

sylvestre reviewed Jan 18, 2026

View reviewed changes

sylvestre merged commit 1a81d1b into uutils:main Jan 18, 2026
156 of 157 checks passed

jfinkels mentioned this pull request Jan 18, 2026

pr: don't convert to String when storing lines to print #10333

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

pr: uniformly scan for form feed and newline chars #10303

pr: uniformly scan for form feed and newline chars #10303

jfinkels commented Jan 18, 2026 •

edited

Loading

Uh oh!

codspeed-hq bot commented Jan 18, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Jan 18, 2026

Uh oh!

github-actions bot commented Jan 18, 2026

Uh oh!

github-actions bot commented Jan 18, 2026

Uh oh!

sylvestre Jan 18, 2026

Uh oh!

jfinkels Jan 18, 2026

Uh oh!

sylvestre Jan 18, 2026

Uh oh!

jfinkels Jan 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

pr: uniformly scan for form feed and newline chars #10303

pr: uniformly scan for form feed and newline chars #10303

Conversation

jfinkels commented Jan 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codspeed-hq bot commented Jan 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will degrade performance by 3.26%

Performance Changes

Footnotes

Uh oh!

github-actions bot commented Jan 18, 2026

Uh oh!

github-actions bot commented Jan 18, 2026

Uh oh!

github-actions bot commented Jan 18, 2026

Uh oh!

sylvestre Jan 18, 2026

Choose a reason for hiding this comment

Uh oh!

jfinkels Jan 18, 2026

Choose a reason for hiding this comment

Uh oh!

sylvestre Jan 18, 2026

Choose a reason for hiding this comment

Uh oh!

jfinkels Jan 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jfinkels commented Jan 18, 2026 •

edited

Loading

codspeed-hq bot commented Jan 18, 2026 •

edited

Loading